In order to survive, reproduce and (in multicellular organisms) differentiate, cells must control the 
concentrations of the myriad different proteins that are encoded in the genome. The precision of this 
control is limited by the inevitable randomness of individual molecular events. Here we explore how 
cells can maximize their control power in the presence of these physical limits; formally, we solve 
the theoretical problem of maximizing the information transferred from inputs to outputs when the 
number of available molecules is held fixed. We start with the simplest version of the problem, in 
which a single transcription factor protein controls the readout of one or more genes by binding to 
DNA. We further simplify by assuming that this regulatory network operates in steady state, that 
the noise is small relative to the available dynamic range, and that the target genes do not interact. 
Even in this simple limit, we find a surprisingly rich set of optimal solutions. Importantly, for each 
locally optimal regulatory network, all parameters are determined once the physical constraints on 
the number of available molecules are specified. Although we are solving an over-simplified version 
of the problem facing real cells, we see parallels between the structure of these optimal solutions and 
the behavior of actual genetic regulatory networks. Subsequent papers will discuss more complete 
versions of the problem. 



I. INTRODUCTION 

Much of the everyday business of organisms involves 
the transmission and processing of information. On our 
human scale, the familiar examples involve the signals 
taken in through our sense organs [1]. On a cellular scale, 
information flows from receptors on the cell surface into 
the cell, modulating biochemical events and ultimately 
controlling gene expression [2]. In the course of devel- 
opment in multicellular organisms, individual cells ac- 
quire information about their location in the embryo by 
responding to particular "morphogen" molecules whose 
concentration varies along the main axes of the embryo 
[31 11]. In all these examples, information of interest to 
the organism ultimately is represented by events at the 
molecular level, whether the molecules are transcription 
factors regulating gene expression or ion channels control- 
ling electrical signals in the brain. This representation is 
limited by fundamental physical principles: individual 
molecular events are stochastic, so that with any finite 
number of molecules there is a limit to the precision with 
which small signals can be discriminated reliably, and 
there is a limit to the overall dynamic range of the sig- 
nals. Our goal in this paper (and its sequel) is to explore 
these limits to information transmission in the context of 
small genetic control circuits. 

The outputs of genetic control circuits are protein molecules that are synthesized by the cell from messenger RNA (mRNA), which in turn is transcribed from the DNA template. The inputs often are protein molecules as well, "transcription factors" that bind to the DNA and regulate the synthesis of the mRNA. In the last decade, a number of experiments have mapped the input/output relations of these regulatory elements, and characterized their noise, that is, the fluctuations in the output protein concentration when the inputs are held fixed […]. In parallel, a number of theoretical papers have tried to understand the origins of this noise, which ultimately reflects the random behavior of individual molecules along the path from input to output: the arrival of transcription factors at their targets along the DNA, the initiation of transcription and the degradation of mRNA, the initiation of protein synthesis and the degradation of the output proteins [18, …]. While open questions remain, it seems fair to say that we have a physical picture of the noise in genetic control that we can use to ask questions about the overall function and design of these systems.

The ability of any system to transmit information is determined not just by input/output relations and noise levels, but also by the distribution of inputs; maximal information transmission requires a matching between the intrinsic properties of the system and the input statistics [30, 31]. In the context of sensory information processing, these matching conditions have been explored almost since the inception of information theory […]. In particular, because the distribution of sensory inputs varies with time, optimal information transmission requires that the input/output relation track or adapt to these variations, and this theoretical prediction has led to a much richer view of adaptation in the neural code [36–40]. There are analogous matching conditions for genetic regulatory elements, and these conditions provide parameter-free predictions about the behavior of the system, based on the idea that cells are trying to transmit the maximum amount of information [41]. Comparison with recent experiments has been encouraging [42].

In this paper we go beyond the matching conditions 
to ask how cells can adjust the input/output relations of 
genetic regulatory elements so as to maximize the infor- 
mation that is transmitted through these systems. Ab- 
sent any constraints, the answer will always be to make 
more molecules, since this reduces the effective noise 
level, so we consider the problem of maximizing informa- 
tion transmission with a fixed mean or maximum number 
of molecules at both the input and the output. In this 
sense we are asking how cells can extract the maximum 
control power, measured in bits, from a given number 
of molecules, thus optimizing functionality under clear 
physical constraints. In general this problem is very difficult, so we start here with the simplest case of a single input transcription factor that controls (potentially) many genes, with no interaction among these outputs. Further, we focus on a limit (small noise) where some analytic progress is possible. We will see that, even in this case, the optimal solutions have an interesting structure, which emerges as a result of the interplay between noise sources at the input and the output of the regulatory elements. For other approaches to the optimization of information transmission in biochemical and genetic networks, see Refs [43–45].

Optimization of information transmission is a concise, 
abstract principle, grounded in the physics of the molec- 
ular interactions that underlie biological function. It 
would be attractive if we could derive the behavior of bi- 
ological systems from such a principle, rather than taking 
the myriad parameters of these systems simply as quan- 
tities that must be fit to data. It is not at all clear, 
however, that such a general principle should apply to 
real biological systems. Indeed, it is possible that solu- 
tions to our optimization problem are far from plausible 
in comparison with what we find in real cells. Thus, our 
most important result is that the parameters which we 
derive are reasonable in relation to experiment. While a 
realistic comparison requires us to solve the optimization 
problem in a fully interacting system, even in the sim- 
pler problem discussed here we can see the outlines of a 
theory for real genetic networks. Subsequent papers will 
address the full, interacting version of the problem. 

II. FORMULATING THE PROBLEM 

A gene regulatory element translates the concentration of input molecules $\mathcal{I}$ into output molecules $\mathcal{O}$. We would like to measure, quantitatively, how effectively changes in the input serve to control the output. If we make many observations on the state of the cell, we will see that inputs and outputs are drawn from a joint distribution $P(\mathcal{I},\mathcal{O})$, and our measure of control power should be a functional of this distribution. In his classic work, Shannon showed that there is only one such measure of control power which obeys certain plausible constraints, and this is the mutual information between $\mathcal{I}$ and $\mathcal{O}$ [30, …].



To be concrete, we consider a set of genes, $i = 1, 2, \ldots, M$, that all are controlled by a single transcription factor. Let the concentration of the transcription factor be $c$ and let the levels of protein expressed from each gene be $g_i$; below we discuss the units and normalization of these quantities. Thus, the input $\mathcal{I} = c$ and the output $\mathcal{O} = \{g_i\}$. In principle these quantities all depend on time. We choose to focus here on the steady state problem, where we assume that the output expression levels reach their equilibrium values before the input transcription factor concentrations change.



We view the steady state approximation not necessar- 
ily as an accurate model of the dynamics in real cells, but 
as a useful starting point, and already the steady state 
problem has a rich structure. In particular, as we will 
see, in this limit we have analytic control over the role 
of nonlinearities in the input/output relation describing 
the function of the different regulatory elements in our 
network. In contrast, most approaches to information 
transmission by dynamic signals are limited to the regime 
of linear response; see, for example, Ref [45]. Although we are focused here on information transmission in genetic circuits, it is interesting that the same dichotomy (nonlinear analyses of static networks and dynamic analyses of linear networks) also exists in the literature on information transmission in neural networks [34, 35].



To specify the joint distribution of inputs and outputs, it is convenient to think that the transcription factor concentration is being chosen out of a probability distribution $P_{TF}(c)$, and then the target genes respond with expression levels chosen out of the conditional distribution $P(\{g_i\}|c)$. In general, the mutual information between the set of expression levels $\{g_i\}$ and the input $c$ is given by [30, 31]

$$
I(\{g_i\};c) = \int dc \int d^M g \, P(c,\{g_i\}) \log_2 \left[ \frac{P(c,\{g_i\})}{P_{TF}(c)\, P(\{g_i\})} \right] \quad \text{bits}, \tag{1}
$$

where the overall distribution of expression levels is given by

$$
P(\{g_i\}) = \int dc \, P_{TF}(c)\, P(\{g_i\}|c). \tag{2}
$$
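Equations (1) and (2) are easiest to internalize on a discretized toy problem. The sketch below is not part of the original analysis: it assumes that inputs and outputs have been binned into a joint probability table (the function name `mutual_information_bits` and the example tables are hypothetical) and evaluates the discrete analogue of Eq (1), with the column sums playing the role of Eq (2).

```python
import math


def mutual_information_bits(joint):
    """Mutual information, in bits, of a joint probability table joint[c][g].

    Discrete analogue of Eq (1); the column sums give the output marginal,
    the discrete analogue of Eq (2).
    """
    total = sum(sum(row) for row in joint)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("joint distribution must sum to one")
    p_in = [sum(row) for row in joint]          # P_TF(c) on the input bins
    p_out = [sum(col) for col in zip(*joint)]   # P(g) on the output bins
    info = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:                         # 0 log 0 = 0 by convention
                info += p * math.log2(p / (p_in[i] * p_out[j]))
    return info


# Output copies a fair binary input: 1 bit.
print(mutual_information_bits([[0.5, 0.0], [0.0, 0.5]]))
# Output flips the input 10% of the time: 1 - H(0.1), about 0.53 bits.
print(mutual_information_bits([[0.45, 0.05], [0.05, 0.45]]))
```

In the small noise analysis that follows, this brute-force sum over a joint table is replaced by an analytic expansion in the noise level.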



Shannon's uniqueness theorem of course leaves open a 
choice of units, and here we make the conventional choice 
of bits, hence the logarithm is base two. 

We will approach the problem of optimizing information transmission in two steps. First, we will adjust the distribution $P_{TF}(c)$ to take best advantage of the input/output relations, and then we will adjust the input/output relations themselves. Even the first step is difficult in general, so we start by focusing on the limit in which noise is small.



A. Information in the small noise limit 

As noted in the Introduction, we will confine our attention in this paper to the case where each gene responds independently to its inputs, and there are no interactions among the output genes; we point toward generalizations in the Discussion below, and return to the more general problem in subsequent papers. The absence of interactions means that the conditional distribution of expression levels must factorize, $P(\{g_i\}|c) = \prod_{i=1}^{M} P_i(g_i|c)$. Further, we assume that the noise in expression levels is Gaussian. Then we have

$$
P(\{g_i\}|c) = \exp\left[ -\frac{M}{2}\ln(2\pi) - \frac{1}{2}\sum_{i=1}^{M} \ln \sigma_i^2(c) - \frac{1}{2}\sum_{i=1}^{M} \frac{\left(g_i - \bar g_i(c)\right)^2}{\sigma_i^2(c)} \right]. \tag{3}
$$

The input/output relation of each gene is defined by the mean $\bar g_i(c)$, while $\sigma_i^2$ measures the variance of the fluctuations or noise in the expression levels at fixed input,

$$
\sigma_i^2(c) = \left\langle \left(g_i - \bar g_i(c)\right)^2 \right\rangle. \tag{4}
$$



In the limit that the noise levels $\sigma_i$ are small, we can develop a systematic expansion of the information $I(\{g_i\};c)$, generalizing the approach of Refs […]. The key idea is that, in the small noise limit, observation of the output expression levels $\{g_i\}$ should be sufficient to determine the input concentration $c$ with relatively high accuracy; further, we expect that errors in this estimation process would be well approximated as Gaussian. Formally, this means that we should have

$$
P(c|\{g_i\}) = \frac{1}{\sqrt{2\pi\sigma_c^2(\{g_i\})}} \exp\left[ -\frac{\left(c - c^*(\{g_i\})\right)^2}{2\sigma_c^2(\{g_i\})} \right], \tag{5}
$$



where $c^*(\{g_i\})$ is the most likely value of $c$ given the outputs, and $\sigma_c^2(\{g_i\})$ is the variance of the true value around this estimate. We can use this expression to calculate the information by writing $I(\{g_i\};c)$ as the difference between two entropies:

$$
I(\{g_i\};c) = -\int dc\, P_{TF}(c)\log_2 P_{TF}(c) + \int d^M g\, P(\{g_i\}) \int dc\, P(c|\{g_i\}) \log_2 P(c|\{g_i\}) \tag{6}
$$

$$
= -\int dc\, P_{TF}(c)\log_2 P_{TF}(c) - \frac{1}{2}\int d^M g\, P(\{g_i\}) \log_2\left[2\pi e\, \sigma_c^2(\{g_i\})\right]. \tag{7}
$$

Intuitively, the first term is the entropy of inputs, which sets an absolute maximum on the amount of information that can be transmitted [48]; the second term is (minus) the entropy of the input given the output, or the "equivocation" [50] that results from noise in the mapping from inputs to outputs. To complete the calculation we need an expression for this effective noise level $\sigma_c$.

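Using Bayes' rule, we have

$$
P(c|\{g_i\}) = \frac{P(\{g_i\}|c)\, P_{TF}(c)}{P(\{g_i\})} \tag{8}
$$

$$
= \frac{1}{Z(\{g_i\})} \exp\left[-F(c,\{g_i\})\right], \tag{9}
$$

where

$$
F(c,\{g_i\}) = -\ln P_{TF}(c) + \frac{1}{2}\sum_{i=1}^{M} \ln \sigma_i^2(c) + \frac{1}{2}\sum_{i=1}^{M} \frac{\left(g_i - \bar g_i(c)\right)^2}{\sigma_i^2(c)}. \tag{10}
$$

Now it is clear that $c^*(\{g_i\})$ and $\sigma_c(\{g_i\})$ are defined by

$$
\left. \frac{\partial F(c,\{g_i\})}{\partial c} \right|_{c = c^*(\{g_i\})} = 0, \tag{11}
$$

$$
\frac{1}{\sigma_c^2(\{g_i\})} = \left. \frac{\partial^2 F(c,\{g_i\})}{\partial c^2} \right|_{c = c^*(\{g_i\})}. \tag{12}
$$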
The leading term at small $\sigma_i$ is then given by

$$
\frac{1}{\sigma_c^2} = \sum_{i=1}^{M} \frac{1}{\sigma_i^2(c)} \left( \frac{d\bar g_i(c)}{dc} \right)^2. \tag{13}
$$
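Equations (10) to (13) can be checked numerically for a single gene. The sketch below assumes, purely for illustration, a flat $P_{TF}$, a constant output noise $\sigma$, and an activating Hill function with hypothetical parameters $K = 1$, $n = 2$ (anticipating Eq (16)); under these assumptions only the last term of Eq (10) depends on $c$. It locates $c^*$ by golden-section search (Eq (11)), takes the curvature by a central finite difference (Eq (12)), and compares the resulting width with the leading-order formula, Eq (13). All names are hypothetical.

```python
import math

K, N_HILL = 1.0, 2.0    # hypothetical Hill parameters for the illustration
SIGMA = 0.02            # assumed constant output noise sigma_1


def gbar(c):
    """Illustrative activating input/output relation (anticipates Eq (16))."""
    return c**N_HILL / (c**N_HILL + K**N_HILL)


def gbar_slope(c):
    """d gbar / dc for the relation above."""
    return N_HILL * c**(N_HILL - 1) * K**N_HILL / (c**N_HILL + K**N_HILL) ** 2


def free_energy(c, g_obs):
    """Eq (10) with M = 1, flat P_TF and constant sigma; c-independent terms dropped."""
    return (g_obs - gbar(c)) ** 2 / (2.0 * SIGMA**2)


def golden_min(f, lo, hi, tol=1e-12):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                  # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = f(x1)
        else:                        # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)


def posterior_width(g_obs, h=1e-4):
    """Return (c_star, sigma_c) from Eqs (11)-(12) for one observed level g_obs."""
    def F(c):
        return free_energy(c, g_obs)
    c_star = golden_min(F, 1e-3, 10.0)                                  # Eq (11)
    curvature = (F(c_star + h) - 2.0 * F(c_star) + F(c_star - h)) / h**2
    return c_star, 1.0 / math.sqrt(curvature)                           # Eq (12)


c_star, sigma_c = posterior_width(0.6)
sigma_c_eq13 = SIGMA / gbar_slope(c_star)      # Eq (13) with a single gene
print(f"c* = {c_star:.6f}, Laplace sigma_c = {sigma_c:.6f}, Eq (13): {sigma_c_eq13:.6f}")
```

With $M$ genes, Eq (13) simply adds the precisions $(d\bar g_i/dc)^2/\sigma_i^2$, so a second, independent copy of the same gene would shrink $\sigma_c$ by a factor $\sqrt{2}$.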



Finally, we note that, in the small noise limit, averages over all the expression levels can be approximated by an integral along the trajectory of mean expression levels, with an appropriate Jacobian. More precisely,

$$
\int d^M g\, P(\{g_i\}) \left[\cdots\right] \approx \int d^M g \int dc\, P_{TF}(c) \prod_{i=1}^{M} \delta\!\left(g_i - \bar g_i(c)\right) \left[\cdots\right]. \tag{14}
$$

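Putting all these terms together, we have

$$
I(\{g_i\};c) = -\int dc\, P_{TF}(c)\log_2 P_{TF}(c) + \frac{1}{2}\int dc\, P_{TF}(c) \log_2\left[ \frac{1}{2\pi e} \sum_{i=1}^{M} \frac{1}{\sigma_i^2(c)} \left( \frac{d\bar g_i(c)}{dc} \right)^2 \right]. \tag{15}
$$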



The small noise approximation is not just a theorist's convenience. A variety of experiments show that fluctuations in gene expression level can be 10–25% of the mean […]. As noted above, maximizing information transmission requires matching the distribution of input signals to the structure of the input/output relations and noise, and in applying these conditions to a real regulatory element in the fruit fly embryo it was shown that the (analytically accessible) small noise approximation gives results which are in semi-quantitative agreement with the (numerical) exact solutions [42]. Thus, although it would be interesting to explore the quantitative deviations from the small noise limit, we believe that this approximation is a good guide to the structure of the full problem.
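For a given input distribution, Eq (15) is a one-dimensional integral. The sketch below evaluates it by the midpoint rule; the function name and its callable arguments (`p_tf`, `slopes`, `sigmas`) are hypothetical. The example is a deliberately simple case (uniform inputs on $[0, 1]$, a linear response with unit slope and constant noise $\sigma$) for which the integrand is constant and Eq (15) reduces to $\log_2(1/\sigma) - \tfrac{1}{2}\log_2(2\pi e)$.

```python
import math


def small_noise_information(p_tf, slopes, sigmas, c_max, n_grid=4000):
    """Eq (15) by the midpoint rule on [0, c_max], in bits.

    p_tf(c)   -- normalized input distribution P_TF(c)
    slopes(c) -- sequence of d gbar_i / dc at c, one entry per gene
    sigmas(c) -- sequence of sigma_i(c), one entry per gene
    """
    dc = c_max / n_grid
    info = 0.0
    for k in range(n_grid):
        c = (k + 0.5) * dc
        p = p_tf(c)
        if p <= 0.0:
            continue                          # no weight, no contribution
        precision = sum((d / s) ** 2 for d, s in zip(slopes(c), sigmas(c)))
        if precision <= 0.0:
            raise ValueError("small-noise formula needs a nonzero slope where P_TF > 0")
        input_entropy = -p * math.log2(p)
        minus_equivocation = 0.5 * p * math.log2(precision / (2.0 * math.pi * math.e))
        info += (input_entropy + minus_equivocation) * dc
    return info


bits = small_noise_information(lambda c: 1.0, lambda c: [1.0], lambda c: [0.01], 1.0)
print(round(bits, 3))   # 4.597 bits for sigma = 0.01
```

Adding a second, identical and independent gene doubles the precision inside the logarithm and therefore adds exactly half a bit in this approximation.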

To proceed, Eq (15) for the information in the small noise limit instructs us to compute the mean response, $\bar g_i(c)$, and the noise, $\sigma_i(c)$, for every regulated gene. Since the properties of noise in gene expression determine to a large extent the structure of optimal solutions, we present in Sec II B a detailed description of these noise sources. In Sec II C we then introduce the 'cost of coding', measured by the number of signaling molecules that the cell has to pay to transmit the information reliably. Finally, we look for optimal solutions in Sec III.

B. Input/output relations and noise

Transcription factors act by binding to DNA near the point at which the "reading" of a gene begins, and either enhancing or inhibiting the process of transcription into mRNA. In bacteria, a simple geometrical view of this process seems close to correct, and one can try to make a detailed model of the energies for binding of the transcription factor(s) and the interaction of these bound factors with the transcriptional apparatus, RNA polymerase in particular [50, 51]. For eukaryotes the physical picture is less clear, so we proceed phenomenologically. If binding of the transcription factor activates the expression of gene $i$, we write

$$
\bar g_i(c) = \frac{c^n}{c^n + K^n}, \tag{16}
$$

and similarly if the transcription factor represses expression we write

$$
\bar g_i(c) = \frac{K^n}{c^n + K^n}. \tag{17}
$$

[Figure 1: schematic of a single regulatory element, with diffusion noise at the input and counting statistics at the output]

FIG. 1: Input proteins at concentration $c$ act as transcription factors for the expression of output proteins, $g$. The diffusive noise in transcription factor concentration and the shot noise at the output both contribute to stochastic gene expression. The regulation process is described using a conditional probability distribution of the output knowing the input, $P(g|c)$, which can be modeled as a Gaussian process with a variance $\sigma_g^2(c)$. In this paper we consider the case of multiple output genes $\{g_i\}$, $i = 1, \ldots, M$, each of which is independently regulated by the process illustrated here with the corresponding noise $\sigma_i^2$.

These are smooth, monotonic functions that interpolate between roughly linear response ($n = 1$ and large $K$) and steep, switch-like behavior ($n \to \infty$) at some threshold concentration ($c = K$). Such 'Hill functions' often are used to describe the cooperative binding of $n$ molecules to their target sites [53], with $F = -k_B T \ln K$ the free energy of binding per molecule, and this is a useful intuition even if it is not correct in detail.
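A minimal sketch of Eqs (16) and (17) makes the two properties quoted above easy to check: the response is half-maximal at $c = K$, and the slope there, $n/(4K)$, grows with the cooperativity $n$. The function names are hypothetical; for $n < 1$ the slope formula diverges at $c = 0$, so the sketch is meant for $n \geq 1$.

```python
def hill_activator(c, K, n):
    """Eq (16): mean expression of an activated gene, maximum normalized to one."""
    return c**n / (c**n + K**n)


def hill_repressor(c, K, n):
    """Eq (17): mean expression of a repressed gene."""
    return K**n / (c**n + K**n)


def hill_activator_slope(c, K, n):
    """d/dc of Eq (16); the repressor slope is the negative of this."""
    return n * c**(n - 1) * K**n / (c**n + K**n) ** 2


# Midpoint value and steepness at c = K for increasing cooperativity.
for n in (1, 2, 4):
    print(n, hill_activator(1.0, 1.0, n), hill_activator_slope(1.0, 1.0, n))
# prints: 1 0.5 0.25 / 2 0.5 0.5 / 4 0.5 1.0 (one line each)
```

The activator slope can equivalently be written as $(n/c)\,\bar g\,(1 - \bar g)$, which is convenient when changing variables from $c$ to $\bar g$.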

To complete our formulation of the problem we need to understand the noise or fluctuations in expression level at fixed inputs, as summarized by the variances $\sigma_i^2$. There are several contributions to the variance, which we can divide into two broad categories, as in Fig 1.

The transcription of mRNA and its translation into 
protein can be thought of as the "output" side of the 
regulatory apparatus. Ultimately these processes are 
composed of individual molecular events, and so there 
should be shot noise from the inherent randomness of 
these events. This suggests that there will be an output 
noise variance proportional to the mean, $\sigma_{i,\mathrm{out}}^2 \propto \bar g_i$.

The arrival of transcription factor molecules can be 
thought of as the "input" side of the apparatus, and again 
there should be noise associated with the randomness in 
this arrival. This noise is equivalent to a fluctuation in 
the input concentration itself; the variance in concentra- 
tion should again be proportional to the mean, and the 
impact of this noise needs to be propagated through the 
input/output relation, so that $\sigma_{i,\mathrm{in}}^2 \propto c\,(d\bar g_i/dc)^2$.

Putting together the input and output noise, we have

$$
\sigma_i^2(c) = a\,\bar g_i(c) + b\,c \left( \frac{d\bar g_i(c)}{dc} \right)^2, \tag{18}
$$



where $a$ and $b$ are constants. Comparing this intuitive estimate to more detailed calculations […] allows us to interpret these constants. If $\bar g_i$ is normalized so that its maximum value is one, then $a = 1/N_{\max}$, where $N_{\max}$ is the maximum number of independent molecules that are made from gene $i$. If, for example, each mRNA molecule generates many proteins during its lifetime, then (if the synthesis of mRNA is limited by a single kinetic step) $N_{\max}$ is the maximum number of mRNAs, as discussed in Refs [50, …].

The shot noise in the arrival of transcription factors at their targets ultimately arises from diffusion of these molecules. Analysis of the coupling between diffusion and the events that occur at the binding site [21, …] shows that the total input noise has both a term $\propto c\,(d\bar g_i/dc)^2$ and additional terms that can be made small by adjusting the parameters describing the kinetics of steps that occur after the molecules arrive at their target; here we assume that Nature chooses parameters which make these non-fundamental noise sources negligible […]. In the remaining term, we have $b \sim 1/(D\ell\tau)$, where $D$ is the diffusion constant of the transcription factor, $\ell$ is the size of its target on the DNA, and $\tau$ is the time over which signals are integrated in establishing the steady state.

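With the (semi-)microscopic interpretation of the parameters, we can write

$$
\sigma_i^2(c) = \frac{1}{N_{\max}} \left[ \bar g_i(c) + c_0\, c \left( \frac{d\bar g_i(c)}{dc} \right)^2 \right], \tag{19}
$$

where there is a natural scale of concentration,

$$
c_0 = \frac{N_{\max}}{D\ell\tau}. \tag{20}
$$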
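Eq (19) is straightforward to evaluate once $\bar g$ is specified. The sketch below (hypothetical function names; an activating Hill function and the parameter values in the loop are assumptions made for illustration) splits the variance into its output (shot noise) and input (diffusion) contributions, which makes it easy to see where along the input axis each one dominates.

```python
def hill(c, K, n):
    """Activating input/output relation, Eq (16)."""
    return c**n / (c**n + K**n)


def hill_slope(c, K, n):
    """d/dc of the activating Hill function."""
    return n * c**(n - 1) * K**n / (c**n + K**n) ** 2


def noise_terms(c, K, n, c0, n_max):
    """Return the (output, input) contributions to sigma^2(c) in Eq (19)."""
    g, dg = hill(c, K, n), hill_slope(c, K, n)
    output_part = g / n_max                  # shot noise, proportional to the mean
    input_part = c0 * c * dg**2 / n_max      # diffusion noise, propagated through the slope
    return output_part, input_part


# Hypothetical parameters; concentrations measured in units of c0.
for c in (0.1, 0.5, 1.0, 3.0):
    out_var, in_var = noise_terms(c, K=0.5, n=2.0, c0=1.0, n_max=50)
    print(f"c = {c:3.1f}  output {out_var:.2e}  input {in_var:.2e}")
```

With these illustrative numbers the two contributions are exactly equal at $c = K$, where $\bar g = 1/2$ and $d\bar g/dc = n/(4K) = 1$; the total variance is simply their sum.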



To get a rough feeling for this scale, we note that diffusion constants for proteins in the cytoplasm are $\sim \mu\mathrm{m}^2/\mathrm{s}$ [… , 56–58], target sizes are measured in nm, and integration times are minutes or hundreds of seconds (although there are few direct measurements). The maximum number of independent molecules depends on the character of the target genes. In many cases of interest, these are also transcription factors, in which case a number of experiments suggest that $N_{\max} \sim 10$–$100$ [12, 22, 29]. Putting these numbers together, we have $c_0 \sim 10$–$100/\mu\mathrm{m}^3$, or $\sim 15$–$150$ nM, although this (obviously) is just an order of magnitude estimate.
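The order-of-magnitude estimate can be reproduced with a few lines of arithmetic. The particular values below ($D = 1\ \mu\mathrm{m}^2/\mathrm{s}$, $\ell = 3$ nm, $\tau = 300$ s) are illustrative picks from within the ranges quoted above, not measured parameters; the conversion uses 1 molecule/$\mu\mathrm{m}^3 \approx 1.66$ nM. Function names are hypothetical.

```python
AVOGADRO = 6.02214076e23           # molecules per mole
LITERS_PER_CUBIC_MICRON = 1e-15


def c0_per_cubic_micron(n_max, D, ell, tau):
    """Eq (20): c0 = N_max / (D * ell * tau), with D in um^2/s, ell in um, tau in s."""
    return n_max / (D * ell * tau)


def to_nanomolar(molecules_per_cubic_micron):
    """Convert a concentration from molecules/um^3 to nM."""
    moles_per_liter = molecules_per_cubic_micron / (AVOGADRO * LITERS_PER_CUBIC_MICRON)
    return moles_per_liter * 1e9


for n_max in (10, 100):
    c0 = c0_per_cubic_micron(n_max, D=1.0, ell=0.003, tau=300.0)
    print(f"N_max = {n_max:3d}: c0 = {c0:6.1f} /um^3 = {to_nanomolar(c0):6.1f} nM")
```

With these inputs the sketch gives roughly 18 nM for $N_{\max} = 10$ and 185 nM for $N_{\max} = 100$, the same order of magnitude as the range quoted in the text; other choices within the stated ranges of $D$, $\ell$ and $\tau$ shift $c_0$ by comparable factors.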

To summarize, two rather general forms of noise limit the information transmission in genetic regulatory networks. Both combine additively and ultimately trace their origin to a finite (and possibly small) number of signaling molecules. The input noise is caused by a small concentration of transcription factor molecules, and its effect on the regulated gene is additionally modulated by the input/output relation. The output noise is caused by the small number of gene products, and this noise is simply proportional to the mean. It is reasonable to believe that the strengths of these two noise sources, in appropriate units, will be of comparable magnitude. Since the organism has to pay a metabolic price to reduce either noise source, it would be wasting resources if it were to lower the strength of one source alone far below the limiting effect of the other.

concentration   scale      system                      Ref
55 ± 10 nM      midpoint   λ repressor in E. coli      […]
55 ± 3 nM       maximum    Bcd in Drosophila embryo    […]
5.3 ± 0.7 nM    midpoint   GAGA                        […]
∼5 nM           midpoint   crp to lac site             […]
∼0.2 nM         midpoint   lac to OR1                  […]
∼3 nM           midpoint   lac to OR2                  [52]
∼110 nM         midpoint   lac to OR3                  […]
22 ± 3 nM       midpoint   lac to OR1 in vitro         [53]

TABLE I: Concentration scales for transcription factors. We collect absolute concentration measurements on transcription factors from several different systems, sometimes indicating the maximum observed concentration and in other cases the concentration that achieves half-maximal activation or repression (midpoint). Bcd is the bicoid protein, a transcription factor involved in early embryonic pattern formation; GAGA is a transcription factor in Drosophila; crp is a transcription factor that acts on a wide range of metabolic genes in bacteria; lac is the well studied operon that encodes proteins needed for lactose metabolism in E. coli, and in the last four rows "lac" denotes the lac repressor, the transcription factor that represses expression of the lac operon; OR1–3 are binding sites for the lac repressor.



C. Constraining means or maxima

To proceed, we need to decide how the problem of maximizing information transmission will be constrained. One possibility is that we fix the maximum number of molecules at the input and the output. The constraint on the output can be implemented by measuring the expression levels in units such that the largest values of the mean expression levels $\bar g_i$ are all equal to one [49]. On the input side, we restrict the range of $c$ to be $c \in [0, c_{\max}]$. With this normalization and limits on the $c$ integrals, we can maximize $I(\{g_i\};c)$ directly by varying the distribution of inputs, adding only a Lagrange multiplier to fix the normalization of $P_{TF}(c)$,

$$
\frac{\delta}{\delta P_{TF}(c)} \left[ I(\{g_i\};c) - \Lambda \int dc\, P_{TF}(c) \right] = 0. \tag{21}
$$

As discussed in Ref [42], the solution to the variational problem defined in Eq (21) is

$$
P^*_{TF}(c) = \frac{1}{Z_1} \left[ \frac{1}{2\pi e} \sum_{i=1}^{M} \frac{1}{\sigma_i^2(c)} \left( \frac{d\bar g_i(c)}{dc} \right)^2 \right]^{1/2}, \tag{22}
$$

where the normalization constant $Z_1$ is given by

$$
Z_1 = \int_0^{c_{\max}} dc \left[ \frac{1}{2\pi e} \sum_{i=1}^{M} \frac{1}{\sigma_i^2(c)} \left( \frac{d\bar g_i(c)}{dc} \right)^2 \right]^{1/2}. \tag{24}
$$

The information transmission with this optimal choice of $P_{TF}(c)$ takes a simple form,

$$
I^* = \log_2 Z_1. \tag{25}
$$

The expression for $Z_1$, and hence the optimal information transmission, has a simple geometric interpretation. As the concentration of the input transcription factor varies, the output moves, on average, along a trajectory in the $M$-dimensional space of expression levels; this trajectory is defined by $\{\bar g_i(c)\}$. Nearby points along this trajectory can't really be distinguished, because of noise; the information transmission should be related to the number of distinguishable points. If the noise level were the same everywhere, this count of distinguishable states would be just the length of the trajectory in units where the standard deviation of the output fluctuations, projected along the trajectory, is one. Since the noise isn't uniform, we should introduce the local noise level into our metric for measuring distances in the space of expression levels, and this is exactly what we see in Eq (24). Thus, we can think of the optimal information transmission as being determined by the length of the path in expression space that the network traces as the input concentration varies, where length is measured with a metric determined by the noise level.
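The recipe of Eqs (22), (24) and (25) is easy to implement by quadrature. In the sketch below (hypothetical names; midpoint rule on a uniform grid), the bracketed factor of Eq (22), i.e. the noise-weighted speed along the trajectory $\{\bar g_i(c)\}$, is computed on a grid, integrated to give $Z_1$, and normalized to give $P^*_{TF}$. As a check we use the output-noise-only limit $\sigma^2 = \bar g/N_{\max}$ (the limit $c_0 \to 0$ of Eq (19)) with an activator, where the integrand becomes $d\bar g/\sqrt{\bar g}$ and therefore $Z_1 = (N_{\max}/2\pi e)^{1/2}\, 2\sqrt{\bar g(c_{\max})}$. The parameter values are assumptions for illustration.

```python
import math


def optimal_input(slopes, sigmas, c_max, n_grid=20000):
    """Small-noise optimum, Eqs (22), (24), (25), by the midpoint rule on [0, c_max].

    slopes(c) and sigmas(c) return one entry per gene.
    Returns (Z1, I_star_bits, grid, p_star), with p_star[k] ~ P*_TF(grid[k]).
    """
    dc = c_max / n_grid
    grid = [(k + 0.5) * dc for k in range(n_grid)]
    weights = []
    for c in grid:
        precision = sum((d / s) ** 2 for d, s in zip(slopes(c), sigmas(c)))
        # Bracketed factor of Eq (22): noise-weighted speed along {gbar_i(c)}.
        weights.append(math.sqrt(precision / (2.0 * math.pi * math.e)))
    z1 = sum(weights) * dc                      # Eq (24)
    p_star = [w / z1 for w in weights]          # Eq (22)
    return z1, math.log2(z1), grid, p_star      # Eq (25)


K, N_HILL, N_MAX, C_MAX = 1.0, 2.0, 100.0, 10.0   # hypothetical parameters


def gbar(c):
    return c**N_HILL / (c**N_HILL + K**N_HILL)


def gbar_slope(c):
    return N_HILL * c**(N_HILL - 1) * K**N_HILL / (c**N_HILL + K**N_HILL) ** 2


# Output noise only: sigma^2 = gbar / N_max.
z1, bits, grid, p_star = optimal_input(
    lambda c: [gbar_slope(c)], lambda c: [math.sqrt(gbar(c) / N_MAX)], C_MAX)
print(f"Z1 = {z1:.4f}, I* = {bits:.3f} bits")
```

Adding genes only adds terms to `precision` at each $c$, which is the geometric statement above: each additional output lengthens the noise-weighted path.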

This information capacity still depends upon the in- 
put/output relations and the noise levels, so we have a 
second layer of optimization that we can perform. Before 
doing this, however, we consider another formulation of 
the constraints. 

As an alternative to fixing the maximum concentration of input transcription factor molecules, we consider fixing the mean concentration. To do this, we introduce, as usual, a second Lagrange multiplier $\alpha$, so that our optimization problem becomes

$$
\frac{\delta}{\delta P_{TF}(c)} \left[ I(\{g_i\};c) - \Lambda \int dc\, P_{TF}(c) - \alpha \int dc\, P_{TF}(c)\, c \right] = 0. \tag{26}
$$

Notice that we can also think of this as maximizing information transmission in the presence of some fixed cost per input molecule.

Solving Eq (26) for the distribution of inputs, $P_{TF}(c)$, we find

$$
P^*_{TF}(c) = \frac{1}{Z_2} \left[ \frac{1}{2\pi e} \sum_{i=1}^{M} \frac{1}{\sigma_i^2(c)} \left( \frac{d\bar g_i(c)}{dc} \right)^2 \right]^{1/2} e^{-\alpha c}, \tag{27}
$$

where

$$
Z_2 = \int_0^{\infty} dc \left[ \frac{1}{2\pi e} \sum_{i=1}^{M} \frac{1}{\sigma_i^2(c)} \left( \frac{d\bar g_i(c)}{dc} \right)^2 \right]^{1/2} e^{-\alpha c}. \tag{28}
$$

As usual in such problems we need to adjust the Lagrange multipliers to match the constraints, which is equivalent to solving

$$
-\frac{\partial \ln Z_2}{\partial \alpha} = \langle c \rangle. \tag{29}
$$
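Eq (29) generally has to be solved numerically. A minimal sketch, assuming that the bracketed square-root factor of Eq (27) is supplied as a function (called `speed` here, a hypothetical name) and that the integral over $c$ can be truncated at a finite cutoff: because $\langle c\rangle$ decreases monotonically as $\alpha$ grows, bisection on $\alpha$ (done here in $\log\alpha$) suffices. The check uses the artificial test case `speed(c) = c`, for which $P^*_{TF}$ is a Gamma distribution with mean $2/\alpha$.

```python
import math


def mean_input(alpha, speed, c_upper, n_grid=4000):
    """<c> under P*_TF(c) proportional to speed(c) * exp(-alpha c), Eqs (27)-(28).

    The integral over c is truncated at c_upper and done by the midpoint rule;
    the ratio below equals -d ln Z2 / d alpha.
    """
    dc = c_upper / n_grid
    z2 = 0.0
    first_moment = 0.0
    for k in range(n_grid):
        c = (k + 0.5) * dc
        w = speed(c) * math.exp(-alpha * c)
        z2 += w * dc
        first_moment += c * w * dc
    return first_moment / z2


def solve_alpha(target_mean, speed, c_upper, lo=1e-3, hi=1e3, rel_tol=1e-10):
    """Bisection in log(alpha) for the multiplier that satisfies Eq (29)."""
    while hi / lo - 1.0 > rel_tol:
        mid = math.sqrt(lo * hi)
        if mean_input(mid, speed, c_upper) > target_mean:
            lo = mid                  # mean still too large: raise the cost
        else:
            hi = mid
    return math.sqrt(lo * hi)


alpha = solve_alpha(2.0, speed=lambda c: c, c_upper=40.0)
print(alpha)   # close to 1, since the Gamma(2, alpha) mean is 2/alpha
```

The cutoff `c_upper` must be large compared with the target mean, otherwise the truncated distribution cannot reach it and the bisection returns the edge of the bracket.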
The optimal information transmission in this case is

$$
I^* = \log_2 Z_2 + \alpha \langle c \rangle \log_2 e. \tag{30}
$$

One might think that, for symmetry's sake, we should consider a formulation in which the mean number of output molecules also is constrained. There is some subtlety to this, since if we know the input/output functions, $\{\bar g_i(c)\}$, and the distribution of inputs, $P_{TF}(c)$, then the mean output levels are determined. Thus it is not obvious that we have the freedom to adjust the mean output levels. We return to this point in Section III C.

III. ONE INPUT, ONE OUTPUT

To get a feeling for the structure of our optimization problem, we consider the case where the transcription factor regulates the expression level of just one gene. If we constrain the maximum concentrations at the input and output, then the information capacity is set by $I = \log_2 Z_1$ [Eq (25)]; substituting our explicit expression for the noise [Eq (19)] we have

$$
Z_1 = \int_0^{c_{\max}} dc \left[ \frac{N_{\max}}{2\pi e}\, \frac{\left(d\bar g(c)/dc\right)^2}{\bar g(c) + c_0\, c \left(d\bar g(c)/dc\right)^2} \right]^{1/2}. \tag{31}
$$

The first point to note is that if the natural scale of concentration, $c_0$, is either very large or very small, then the optimization problem loses all of its structure. Specifically, in these two limits we have

$$
Z_1(c_0 \to \infty) = \left( \frac{D\ell\tau}{2\pi e} \right)^{1/2} \int_0^{c_{\max}} \frac{dc}{\sqrt{c}} \tag{32}
$$

$$
= \left( \frac{2 D\ell\tau\, c_{\max}}{\pi e} \right)^{1/2} \tag{33}
$$

and

$$
Z_1(c_0 \to 0) = \left( \frac{N_{\max}}{2\pi e} \right)^{1/2} \int_0^{c_{\max}} \frac{dc}{\sqrt{\bar g(c)}} \left| \frac{d\bar g(c)}{dc} \right| \tag{34}
$$

$$
= \left( \frac{2 N_{\max}}{\pi e} \right)^{1/2}, \tag{35}
$$

where in Eq (35) we have assumed that the output spans its full range, $0 \le \bar g \le 1$. In both cases, the magnitude of the information capacity becomes independent of the shape of the input/output relation $\bar g(c)$. Thus, the possibility that real input/output relations are determined by the optimization of information transmission depends on the scale $c_0$ being comparable to the range of concentrations actually used in real cells. Although we have only a rough estimate of $c_0 \sim 15$–$150$ nM, Table I suggests that this is the case.

A. Numerical results with $c_{\max}$

To proceed, we choose $c_0$ as the unit of concentration, so that

$$
Z_1 = \left( \frac{N_{\max}}{2\pi e} \right)^{1/2} \hat Z_1, \tag{36}
$$

$$
\hat Z_1(K/c_0, n; C) = \int_0^{C} dx \left[ \frac{\left(dg(x)/dx\right)^2}{g(x) + x \left(dg(x)/dx\right)^2} \right]^{1/2}, \tag{37}
$$

where $C = c_{\max}/c_0$ and

$$
g(x) = \frac{x^n}{x^n + (K/c_0)^n} \tag{38}
$$

in the case of an activator. It now is straightforward to explore the function $\hat Z_1$ numerically. An example, with $c_{\max}/c_0 = 1$, is shown in Fig 2.

[Figure 2: map of $\hat Z_1$ as a function of the cooperativity $n$ and the midpoint $K$]

FIG. 2: (Color online) Information capacity for one (activator) input and one output. The information is $I = \log_2 \hat Z_1 + A$, with $A$ independent of the parameters; the map shows $\hat Z_1$ as computed from Eq (37), here with $C = c_{\max}/c_0 = 1$. We see that there is a broad optimum with cooperativity $n_{\mathrm{opt}} = 1.86$ and $K_{\mathrm{opt}} = 0.48\,c_0 = 0.48\,c_{\max}$.

We see that, with $c_{\max} = c_0$, there is a well defined but broad optimum of the information transmission as a function of the parameters $K$ and $n$ describing the input/output relation. Maximum information transmission occurs at modest levels of cooperativity ($n \approx 2$) and with the midpoint of the input/output relation near the midpoint of the available dynamic range of input concentrations ($K \approx c_{\max}/2$).
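The kind of map shown in Fig 2 can be approximated with a brute-force grid search over $K/c_0$ and $n$. In the sketch below (hypothetical names and grid), Eq (37) is evaluated for the activator of Eq (38) with the midpoint rule after the substitution $x = u^2$, which removes the integrable $x^{-1/2}$ behavior of the integrand near $x = 0$ when $n = 1$. Since $I = \log_2 \hat Z_1$ plus a constant, maximizing $\hat Z_1$ is the same as maximizing the information. The grid is coarse, so the printed optimum should be read as approximate and compared with the Fig 2 caption rather than taken as a reproduction of it. Note also that $\hat Z_1 < 2$ always, because dropping the input-noise term in the denominator turns the integrand into $|dg/dx|/\sqrt{g}$.

```python
import math


def z1_hat(k, n, C, n_grid=400):
    """Eq (37) for the activator of Eq (38), with k = K/c0 and C = c_max/c0.

    Uses the substitution x = u**2 (dx = 2u du) and the midpoint rule in u.
    """
    du = math.sqrt(C) / n_grid
    total = 0.0
    for j in range(n_grid):
        u = (j + 0.5) * du
        x = u * u
        g = x**n / (x**n + k**n)
        dg = n * x**(n - 1) * k**n / (x**n + k**n) ** 2
        total += 2.0 * u * dg / math.sqrt(g + x * dg * dg)
    return total * du


def scan(C, ks, ns):
    """Brute-force search: returns (best z1_hat, k, n) over the grid."""
    return max((z1_hat(k, n, C), k, n) for k in ks for n in ns)


ks = [round(0.05 * i, 2) for i in range(2, 31)]     # K/c0 from 0.10 to 1.50
ns = [round(1.0 + 0.1 * i, 1) for i in range(31)]   # n from 1.0 to 4.0
best, k_opt, n_opt = scan(1.0, ks, ns)
print(f"Z1_hat = {best:.3f} at K/c0 = {k_opt:.2f}, n = {n_opt:.1f}")
```

A repressor can be scanned the same way by replacing $g$ with $1 - g$ and flipping the sign of the slope; only the absolute value of $dg/dx$ enters Eq (37).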

Optimal solutions for activators and repressors have qualitatively similar behaviors, with the optimal parameters $K_{\mathrm{opt}}$ and $n_{\mathrm{opt}}$ both increasing as $c_{\max}$ increases (Fig 3A). Interestingly, at the same value of $c_{\max}$, the optimal repressors make fuller use of the dynamic range of outputs. The information capacity itself, however, is almost identical for activators and repressors, across a wide range of $c_{\max}$ (Fig 3C). This is important, because it shows that our optimization problem, even in this simplest form, can have multiple nearly degenerate solutions. We also see that increases of $c_{\max}$ far beyond $c_0$ produce a rapidly saturating information capacity, as expected from Eq (35). Therefore, although increasing the dynamic range always results in an increase of capacity, the advantage in terms of information capacity gained by the cell being able to use input concentration regimes much larger than $c_0$ is quite small.



B. Some analytic results 

Although the numerical results are straightforward, we would like to have some intuition about these optimal solutions from analytic approximations. Our basic problem is to do the integral defining $\hat Z_1$ in Eq (37). We know that this integral becomes simple in the limit that $C$ is either large or small, so let's start by trying to generate an approximation that will be valid at large $C$.

FIG. 3: (Color online) The optimal solutions for one gene controlled by one transcription factor. The optimization of information transmission in the small noise limit depends on only one parameter, which we take here as the maximum concentration of the input molecules, measured in units determined by the noise itself [$c_0$ from Eq (20)]. Panel A shows the optimal input/output relations with $c_{\max}/c_0 = 0.3, 1, 3, 10, 30, 100, 300$; activators shown in blue (solid line), repressors in green (dashed line). Although the input/output relation is defined for all $c$, we show here only the part of the dynamic range that is accessed when $0 \le c \le c_{\max}$. Panel B shows the optimal distributions, $P^*_{TF}(c)$, for each of these solutions. Panel C plots $\log_2 \hat Z_1$ for these optimal solutions as a function of $c_{\max}/c_0$. Up to an additive constant, this is the optimal information capacity, in bits.

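At large $C$, the concentration of input molecules can become large, so we expect that the 'output noise,' $\propto g$, will be dominant. This suggests that we write

$$
\begin{aligned}
Z_1 &= \int_0^{C} dx\left[\frac{(dg(x)/dx)^2}{g(x) + x\,(dg(x)/dx)^2}\right]^{1/2}
= \int_0^{C} dx\,\frac{1}{\sqrt{g(x)}}\left|\frac{dg(x)}{dx}\right|\left[1 + \frac{x}{g(x)}\left(\frac{dg(x)}{dx}\right)^2\right]^{-1/2}\\
&\approx \int_0^{C} dx\,\frac{1}{\sqrt{g(x)}}\left|\frac{dg(x)}{dx}\right|\left[1 - \frac{x}{2\,g(x)}\left(\frac{dg(x)}{dx}\right)^2 + \cdots\right].
\end{aligned}
\tag{39}
$$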



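To proceed, we note the combination $dx\,|dg/dx|$, which invites us to convert this into an integral over $g$. We use the fact that, for activators described by the Hill function in Eq (38),

$$x\,\frac{dg(x)}{dx} = n\,g(1-g), \tag{40}$$

$$x = \frac{K}{c_0}\left(\frac{g}{1-g}\right)^{1/n}. \tag{41}$$

Substituting, we find

$$Z_1 \approx \int_0^{g(C)} \frac{dg}{\sqrt{g}}\left[1 - \frac{n^2 c_0}{2K}\, g^{1-1/n}(1-g)^{2+1/n}\right] \tag{42}$$

$$= 2\sqrt{g(C)} - \frac{n^2 c_0}{2K}\int_0^{g(C)} dg\, g^{1/2-1/n}(1-g)^{2+1/n} + \cdots. \tag{43}$$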

Again, we are interested in large $C$, so we can approximate $g(C) \approx 1 - (K/c_{\max})^n$. Similarly, the second term in Eq (43) can be approximated by letting the upper limit on the integral approach 1; the difference between $g(C)$ and 1 generates higher-order terms in powers of $1/C$. Thus we have

$$Z_1 \approx 2 - \left(\frac{K}{c_{\max}}\right)^n - \frac{n^2 c_0}{2K}\,A(n), \tag{44}$$

$$A(n) = \int_0^1 dz\, z^{1/2-1/n}(1-z)^{2+1/n} \tag{45}$$

$$= \frac{\Gamma(3/2-1/n)\,\Gamma(3+1/n)}{\Gamma(9/2)}. \tag{46}$$
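Eq (46) is the Beta-function form of the integral in Eq (45), $A(n) = \mathrm{B}(3/2 - 1/n,\, 3 + 1/n)$. A quick standard-library cross-check of that identity might look like the following; the helper names are ours, and the midpoint rule is chosen for transparency rather than accuracy (it converges slowly when the exponent of $z$ is negative, as happens for $n < 2$).

```python
import math

def A_closed_form(n):
    """Eq. (46): Gamma(3/2 - 1/n) Gamma(3 + 1/n) / Gamma(9/2); requires n > 2/3."""
    return math.gamma(1.5 - 1.0 / n) * math.gamma(3.0 + 1.0 / n) / math.gamma(4.5)

def A_quadrature(n, steps=20000):
    """Eq. (45) by the midpoint rule: integral_0^1 z**(1/2 - 1/n) (1 - z)**(2 + 1/n) dz."""
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** (0.5 - 1.0 / n)
                   * (1.0 - (i + 0.5) * h) ** (2.0 + 1.0 / n)
                   for i in range(steps))

for n in (1.5, 2.0, 3.0, 4.0):
    print(f"n={n}: closed form {A_closed_form(n):.6f}, quadrature {A_quadrature(n):.6f}")
```

For $n = 2$ the closed form reduces to $A(2) = \Gamma(7/2)/\Gamma(9/2) = 2/7$.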



The approximate expression for $Z_1$ expresses the basic compromise involved in optimizing information transmission. On the one hand, we would like $K$ to be small so that the output runs through its full dynamic range; correspondingly, we want to decrease the term $(K/c_{\max})^n$. On the other hand, we want to move the most sensitive part of the input/output relation to higher concentrations, so that we are less sensitive to the input noise; this corresponds to decreasing the term $\propto c_0/K$. The optimal compromise is reached at

$$\frac{K_{\rm opt}^{A}}{c_{\max}} = \left[\frac{n\,A(n)\,c_0}{2\,c_{\max}}\right]^{1/(n+1)}. \tag{47}$$



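Parallel arguments yield, for repressors,

$$Z_1^{R} \approx 2 - 2\left(\frac{K}{c_{\max}}\right)^{n/2} - \frac{n^2 c_0}{2K}\,B(n), \tag{48}$$

$$\frac{K_{\rm opt}^{R}}{c_{\max}} = \left[\frac{n\,B(n)\,c_0}{2\,c_{\max}}\right]^{2/(n+2)}, \tag{49}$$

$$B(n) = \int_0^1 dz\, z^{1/2+1/n}(1-z)^{2-1/n} \tag{50}$$

$$= \frac{\Gamma(3/2+1/n)\,\Gamma(3-1/n)}{\Gamma(9/2)}. \tag{51}$$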



The first thing we notice about our approximate results is that the optimal values of $K$ are almost proportional to $c_{\max}$, as one might expect, but not quite: the growth of $K$ with $c_{\max}$ is slightly sublinear. Also, one might have expected that $K$ would be chosen to divide the available dynamic range into roughly equal 'on' and 'off' regions, which should maximize the entropy of the output and hence increase the capacity; achieving this requires $K_{\rm opt}/c_{\max} \approx 1/2$. In fact, we see that the ratio $K_{\rm opt}/c_{\max}$ is determined by a combination of terms, and depends in an essential way on the scale of the input noise $c_0$, even though we assume that the maximal concentration is large compared with this scale.
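To make the sublinear growth concrete, the sketch below evaluates Eqs (47) and (49) at a fixed, illustrative Hill coefficient ($n = 2$, our choice rather than the numerically optimal $n_{\rm opt}$), with $C = c_{\max}/c_0$. It also checks that the activator value is a stationary point of the approximate $Z_1$ in Eq (44). All function names are hypothetical.

```python
import math

def A(n):
    """Eq. (46)."""
    return math.gamma(1.5 - 1.0 / n) * math.gamma(3.0 + 1.0 / n) / math.gamma(4.5)

def B(n):
    """Eq. (51)."""
    return math.gamma(1.5 + 1.0 / n) * math.gamma(3.0 - 1.0 / n) / math.gamma(4.5)

def k_ratio_activator(C, n):
    """Eq. (47): K_opt/c_max for activators, with C = c_max/c0."""
    return (n * A(n) / (2.0 * C)) ** (1.0 / (n + 1.0))

def k_ratio_repressor(C, n):
    """Eq. (49): K_opt/c_max for repressors."""
    return (n * B(n) / (2.0 * C)) ** (2.0 / (n + 2.0))

def dZ1_dK(K, C, n):
    """Derivative of the large-C approximation, Eq. (44), in units where c0 = 1."""
    return -n * K ** (n - 1) / C ** n + n * n * A(n) / (2.0 * K * K)

for C in (10.0, 100.0, 1000.0):
    ka, kr = k_ratio_activator(C, 2.0), k_ratio_repressor(C, 2.0)
    print(f"C={C:7.1f}: K_A/c_max={ka:.3f}, K_R/c_max={kr:.3f}, "
          f"dZ1/dK at K_A={dZ1_dK(ka * C, C, 2.0):.1e}")
```

With $n = 2$ the activator ratio falls from about 0.31 at $C = 10$ to about 0.14 at $C = 100$, so $K_{\rm opt}$ grows more slowly than $c_{\max}$, and the repressor ratio is the smaller of the two at every $C$.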

The basic compromise between extending the dynamic range of the outputs and avoiding low input concentrations works differently for activators and repressors. As a result, the optimal values of $K$ are different in the two cases. From Eq (37), it is clear that the symmetry between the two types of regulation is broken by the noise term proportional to $g$. Unless the optimal Hill coefficient for repressors were very much smaller than for activators (and it is not), Eqs (47) and (49) predict that $K_{\rm opt}^{R}$ will be smaller than $K_{\rm opt}^{A}$, in agreement with the numerical results in Fig 3. (For equal $n$, the exponent $2/(n+2)$ in Eq (49) exceeds the exponent $1/(n+1)$ in Eq (47), and $B(n) < A(n)$; since both bracketed ratios are below one at large $c_{\max}$, the repressor ratio is the smaller.)

To test these analytic approximations, we can compare the predicted values of $K_{\rm opt}$ with those found numerically. There is a slight subtlety, since our analytic results for $K_{\rm opt}$ depend on the Hill coefficient $n$. We can take this coefficient as known from the numerical optimization, or we can use the approximations to $Z_1$ [as in Eq (44)] to simultaneously optimize for $K$ and $n$. In contrast to the optimization of $K$, however, there is no simple formula for $n_{\rm opt}$, even in our approximation at large $c_{\max}$.

Results for the approximate vs. numerically exact optimal $K$ are shown in Fig 4. As it should, the approximation approaches the exact answer as $c_{\max}$ becomes large. In fact, the approximation is quite good even at $c_{\max}/c_0 \sim 10$, and for activators the error in $K_{\rm opt}$ is only $\sim 15\%$ at $c_{\max}/c_0 \approx 3$. Across the full range of $c_{\max}/c_0 > 1$, the analytic approximation captures the basic trends: $K_{\rm opt}/c_{\max}$ is a slowly decreasing function of $c_{\max}/c_0$, $K_{\rm opt}^{A}$ is larger than $K_{\rm opt}^{R}$ by roughly a factor of 2, and for both activators and repressors $K_{\rm opt}$ is noticeably smaller than $c_{\max}/2$. Similarly good results are obtained for the approximate predictions of the optimal Hill coefficient $n$, as shown in Fig 4B.

[Figure 4 graphic: panel A plots $K_{\rm opt}/c_{\max}$ and panel B the optimal Hill coefficient $n$, each against $c_{\max}/c_0$; curves show full optimization, large $c_{\max}$ with known $n$, and large $c_{\max}$, for activators and repressors.]

FIG. 4: Approximate results for the optimal values of $K$ (A) and $n$ (B) compared with exact numerical results for activators (black lines) and repressors (gray lines). As explained in the text, we can use our analytic approximations to determine, for example, the optimal $K$ assuming $n$ is known ("large $c_{\max}$ with known $n$" results), or we can simultaneously optimize both parameters ("large $c_{\max}$" results); results are shown for both calculations.

As noted above, the large-$c_{\max}$ approximation makes clear that optimizing information transmission is a compromise between using the full dynamic range of outputs and avoiding expression levels associated with large noise at low concentration of the input. The constraint of using the full dynamic range pushes the optimal $K$ downward; this constraint is stronger for repressors [compare the $(K/c_{\max})^n$ penalty in Eq (44) with the corresponding $2(K/c_{\max})^{n/2}$ penalty for repressors], causing the optimal $K$s of repressors to be smaller than those of the activators. On the other hand, avoiding input noise pushes the most sensitive part of the expression profile toward high concentrations, favoring large $K$. The fact that this approximation captures the basic structure of the numerical solution to the optimization problem encourages us to think that this intuitive compromise is the essence of the problem. It is also worth noting that as $c_{\max}$ increases, activators increase their output range, hence gaining capacity. On the other hand, the output of the repressed systems is small for large $c_{\max}$ and the output noise thus is large, limiting the increase in capacity compared to the activated genes, as is seen in Fig 3C.

In the case of small $c_{\max}$ it is harder to obtain detailed expressions for $K$; however, we can still gain insight from the expression for the capacity in this limit. To obtain the large-$c_{\max}$ limit we assumed that $g \gg x(dg/dx)^2$ in the denominator of the integrand which defines $Z_1$; to obtain the small-$c_{\max}$ limit we make the opposite assumption:

$$
\begin{aligned}
Z_1 &= \int_0^{C} dx\left[\frac{(dg(x)/dx)^2}{g(x) + x\,(dg(x)/dx)^2}\right]^{1/2}
= \int_0^{C} \frac{dx}{\sqrt{x}}\left[\frac{1}{1 + g(x)/\left(x\,(dg(x)/dx)^2\right)}\right]^{1/2}\\
&\approx \int_0^{C} \frac{dx}{\sqrt{x}}\left[1 - \frac{x}{2n^2\,g(1-g)^2} + \cdots\right],
\end{aligned}
\tag{52}
$$

where in the last step we use the relation $x\,dg/dx = n\,g(1-g)$ from Eq (40). We see that, if $g$ approaches one, the first correction term will diverge. This allows us to predict the essential feature of the optimal solutions at small $c_{\max}$, namely that they do not access the full dynamic range of outputs.
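As a worked consequence of Eq (52), added here for orientation rather than taken from the analysis above: the bracket in the middle expression of Eq (52) never exceeds one, and for any monotone $g$ with $0 \le g \le 1$ the integral of $|dg/dx|/\sqrt{g}$ is at most 2, so $Z_1$ obeys two elementary bounds,

$$
Z_1 \le \int_0^C \frac{dx}{\sqrt{x}} = 2\sqrt{C},
\qquad
Z_1 \le \int_0^C dx\,\frac{|dg/dx|}{\sqrt{g}} = 2\left|\sqrt{g(C)} - \sqrt{g(0)}\right| \le 2,
$$

$$
\log_2 Z_1 \le \min\left\{1 + \tfrac{1}{2}\log_2 C,\; 1\right\}.
$$

The first bound is the relevant one at small $C$, where it rises by half a bit per doubling of $c_{\max}$; the second caps $\log_2 Z_1$ at large $C$, consistent with the saturation seen in Fig 3C.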



C. Constraining means 

Here we would like to solve the same optimization problem by constraining the mean concentrations, rather than imposing a hard constraint on the maximal concentrations; as noted above, we can also think of this problem as maximizing information subject to a fixed cost per molecule. To compare results in a meaningful way, we should know how the mean concentration varies as a function of $c_{\max}$ when we solve the problem with constrained maxima, and this is shown in Fig 5A. An interesting feature of these results is that mean concentrations are much less than half of the maximal concentration. Also, the mean input concentrations for activator and repressor systems are similar, despite different values of the optimal $K$. This result shows that for a given dynamic range defined by $c_{\max}$, there is an optimal mean input concentration, which is independent of whether the input/output relation is up- or down-regulating.

Equation (28) shows us how to compute the partition function $Z_2$ for the case where we constrain the mean concentration of transcription factors, and Eq (30) relates this to the information capacity $I_2$. Substituting our explicit expressions for the noise in the case of one input and one output, we have

$$I_2 = \log_2\left[\left(\frac{N_{\max}}{2\pi e}\right)^{1/2} Z_2\right] + \frac{\alpha\,\langle c\rangle}{\ln 2}, \tag{53}$$

$$Z_2 = \int_0^{\infty} dc\; e^{-\alpha c}\left[\frac{(dg(c)/dc)^2}{g(c) + c\,c_0\,(dg(c)/dc)^2}\right]^{1/2}. \tag{54}$$



As before, we choose Hill functions for $g(c)$, and maximize $I_2$ with respect to the parameters $K$ and $n$. This defines a family of optimal solutions parameterized by the Lagrange multiplier $\alpha$, and we can tune this parameter to match the mean concentration $\langle c\rangle$. Using the calibration in Fig 5A, we can compare these results with those obtained by optimizing with a fixed maximum concentration. Results are shown in Fig 5B-D.
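A minimal sketch of the calibration step, under assumptions that are ours rather than statements of the text: the optimal input distribution has the form $P^*(c) \propto e^{-\alpha c}\,[g'^2/(g + c\,c_0\,g'^2)]^{1/2}$ that a mean constraint produces in the small noise limit, concentrations are in units of $c_0$ (so $\alpha$ is in units of $1/c_0$), and the range $c \in [0, \infty)$ is truncated at a hypothetical cutoff `L`. Because the mean of an exponentially tilted distribution decreases monotonically as $\alpha$ grows, bisection finds the $\alpha$ that reproduces a target $\langle c\rangle$. The Hill parameters and the target are placeholders.

```python
import math

def quadrature_nodes(K, n, L=200.0, steps=4000, activator=True):
    """Nodes x_i (units of c0) and weights w_i ~ sigma_c^{-1}(x_i) dx on [0, L].

    L is a hypothetical cutoff standing in for c -> infinity; the substitution
    x = L u**2 softens the x -> 0 endpoint before applying a midpoint rule."""
    h = 1.0 / steps
    xs, ws = [], []
    for i in range(steps):
        u = (i + 0.5) * h
        x = L * u * u
        xn, Kn = x ** n, K ** n
        denom = xn + Kn
        g = xn / denom if activator else Kn / denom
        dg = n * x ** (n - 1) * Kn / denom ** 2  # |dg/dx|
        xs.append(x)
        ws.append(math.sqrt(dg * dg / (g + x * dg * dg)) * 2.0 * L * u * h)
    return xs, ws

def mean_input(alpha, xs, ws):
    """<c>/c0 under the assumed optimum P*(c) ~ exp(-alpha c) / sigma_c(c)."""
    tilted = [w * math.exp(-alpha * x) for x, w in zip(xs, ws)]
    return sum(x * t for x, t in zip(xs, tilted)) / sum(tilted)

def tune_alpha(target, xs, ws, lo=1e-3, hi=10.0, iters=60):
    """Bisection on alpha; assumes mean_input(lo) > target > mean_input(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_input(mid, xs, ws) > target:
            lo = mid  # mean still too large: strengthen the penalty
        else:
            hi = mid
    return 0.5 * (lo + hi)

xs, ws = quadrature_nodes(K=1.0, n=2.0)
alpha = tune_alpha(0.3, xs, ws)
print(f"alpha = {alpha:.4f}, <c>/c0 = {mean_input(alpha, xs, ws):.4f}")
```

In a full calculation this calibration would sit inside the outer optimization over $K$ and $n$, so that each candidate input/output relation is compared at the same mean concentration.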

The most important conclusion from Fig 5 is that constraining mean concentrations and constraining maximal concentrations give, for this simple problem of one input and one output, essentially the same answer. The values of the optimal $K$s are almost identical (Fig 5C), as are the actual numbers of bits that can be transmitted (Fig 5B). The only systematic difference is in the Hill coefficient $n$, where having a fixed maximal concentration drives the optimization toward slightly larger values of $n$ (Fig 5D), so that more of the dynamic range of outputs is accessed before the system runs up against the hard limit at $c = c_{\max}$.

It is interesting that the optimal value of $K$ is more nearly a linear function of $\langle c\rangle$ than of $c_{\max}$, as we see in Fig 5C. To understand this, we follow the steps in Section III B, expanding the expression for $\langle c\rangle$ in the same approximation that we used for large $c_{\max}$:

$$
\langle c\rangle = \frac{\displaystyle\int_0^{c_{\max}} dc\; c\left[\frac{(dg(c)/dc)^2}{g(c) + c\,c_0\,(dg(c)/dc)^2}\right]^{1/2}}{\displaystyle\int_0^{c_{\max}} dc\left[\frac{(dg(c)/dc)^2}{g(c) + c\,c_0\,(dg(c)/dc)^2}\right]^{1/2}}. \tag{55}
$$



In the case of an activator, $c = K\,[g/(1-g)]^{1/n}$, and the leading term in $K$ becomes

$$\langle c\rangle \approx \frac{K}{2\sqrt{g(C)}}\int_0^{g(C)} dg\; g^{-1/2}\left(\frac{g}{1-g}\right)^{1/n} + \cdots. \tag{56}$$



To get some intuition for the numerical values of these terms, we will assume the integral covers the whole expression range $g \in [0, 1]$, and $n = 3$. Then this expression simplifies to

$$\langle c\rangle \approx 0.86\,K + 0.52, \tag{57}$$

so we understand how this simple result emerges, at least asymptotically at large $c_{\max}$.

FIG. 5: A: Mean concentration of the transcription factor when we optimize information transmission subject to a constraint on the maximum concentration. Results are shown for one input and one output, both for activators and repressors. The dashed black line shows equality. B-D: Comparing two formulations of the optimization problem for activators (black lines) and repressors (gray lines), calculated with a finite dynamic range ($c_{\max}$; circles and solid lines) and with constrained means (crosses and dashed lines). The panels show the relative information in panel B, the optimal value of $K$ in panel C, and the optimal value of the Hill coefficient in panel D. In panel C, approximate results for $K$ are shown as a function of $\langle c\rangle$, from Eqs (56) and (58).

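In the case of repressors, $c = K\,[(1-g)/g]^{1/n}$, and the leading term in $K$ is

$$\langle c\rangle \approx \frac{K}{2\left[\sqrt{g(0)} - \sqrt{g(C)}\right]}\int_{g(C)}^{g(0)} dg\; g^{-1/2}\left(\frac{1-g}{g}\right)^{1/n} + \cdots. \tag{58}$$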



As in the case of the activator, making the rough approximation that $n = 3$ and $g \in [0, 1]$ allows us to get some intuition for this large-$c_{\max}$ result:

$$\langle c\rangle \approx 2.8\,K + 1.19. \tag{59}$$

These extremely crude estimates do predict the basic linear trends in Fig 5C, including the fact that for a given value of the mean concentration, the repressor has a smaller $K$ than the activator.
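The slopes 0.86 and 2.8 in Eqs (57) and (59) are the $K$-coefficients of the leading terms in Eqs (56) and (58), taken over the whole range $g \in [0, 1]$. Both integrals are Beta functions, so they have closed forms that are easy to check; the sketch below reproduces only these slopes, not the intercepts 0.52 and 1.19, which come from the remaining $c_0$-dependent terms. The function names are ours.

```python
import math

def activator_slope(n):
    """(1/2) * integral_0^1 g**(-1/2) * (g/(1-g))**(1/n) dg
    = Gamma(1/2 + 1/n) * Gamma(1 - 1/n) / (2 * Gamma(3/2)); finite only for n > 1."""
    return math.gamma(0.5 + 1.0 / n) * math.gamma(1.0 - 1.0 / n) / (2.0 * math.gamma(1.5))

def repressor_slope(n):
    """(1/2) * integral_0^1 g**(-1/2) * ((1-g)/g)**(1/n) dg
    = Gamma(1/2 - 1/n) * Gamma(1 + 1/n) / (2 * Gamma(3/2)); finite only for n > 2."""
    return math.gamma(0.5 - 1.0 / n) * math.gamma(1.0 + 1.0 / n) / (2.0 * math.gamma(1.5))

print(round(activator_slope(3), 2), round(repressor_slope(3), 2))  # prints: 0.86 2.8
```

The divergence of the repressor integral for $n \le 2$ (from the small-$g$, large-$c$ end) is a reminder that extending the integral over the whole range $g \in [0, 1]$ is only a rough device; in the actual problem the cutoff at $c_{\max}$ keeps these integrals finite.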

Before leaving this section, we should return to the question of constraining mean outputs, as well as mean inputs. We have measured the input concentration in absolute units (or relative to the physical scale $c_0$), so when we constrain the mean input we really are asking that the system use only a fixed mean number of molecules. In contrast, we have measured outputs in relative units, so that the maximum of $g(c)$ is one. If we want to constrain the mean number of output molecules, we need to fix not $\langle g\rangle$, but rather $N_{\max}\langle g\rangle$, since the factor of $N_{\max}$ brings us back to counting the molecules in absolute terms [51]. Thus, exploring constrained mean output requires us to view $N_{\max}$ (and hence the scale $c_0$) as an extra adjustable parameter.

By itself, adding $N_{\max}$ as an additional optimization parameter makes our simple problem more complicated, but does not seem to add much insight. In principle it would allow us to discuss the relative information gain on adding extra input vs. output molecules, with the idea that we might find optimal information transmission subject to some net resource constraint; for initial results in this direction see Ref. In networks with feedback, the target genes also act as transcription factors, and these tradeoffs should be more interesting. We will return to this problem in subsequent papers.



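IV. MULTIPLE OUTPUTS

When the single transcription factor at the input of our model system has multiple independent target genes, and we constrain the maximal concentrations, the general form of the information capacity in the small noise limit is given by Eq (24),

$$
\begin{aligned}
I_1 &= \log_2\left\{\left(\frac{1}{2\pi e}\right)^{1/2}\int_0^{c_{\max}} dc\left[\sum_{i=1}^{M}\frac{1}{\sigma_i^2(c)}\left(\frac{dg_i(c)}{dc}\right)^2\right]^{1/2}\right\}\\
&= \log_2\left\{\left(\frac{N_{\max}}{2\pi e}\right)^{1/2}\int_0^{c_{\max}} dc\left[\sum_{i=1}^{M}\frac{(dg_i(c)/dc)^2}{g_i(c) + c\,c_0\,(dg_i(c)/dc)^2}\right]^{1/2}\right\},
\end{aligned}
\tag{60}
$$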



where $\sigma_i^2(c)$ is the output noise variance of target gene $i$, and where we assume for simplicity that the basic parameters $N_{\max}$ and $D\ell\tau$ are the same for all the target genes. Once again, $c_0 = N_{\max}/(D\ell\tau)$ provides a natural unit of concentration. We limit ourselves to an extended discussion of the case with a hard upper bound, $c_{\max}$, to the dynamic range of the input. As in the case of a single output, the calculation with a constrained mean input concentration gives essentially the same results.

To get an initial feeling for the structure of the problem, we try the case of five target genes, all of which are activated by the transcription factor. Then

$$g_i(c) = \frac{c^{n_i}}{c^{n_i} + K_i^{n_i}}, \tag{61}$$



and we can search numerically for the optimal settings of all the parameters $\{K_i, n_i\}$. Results are shown in Fig 6. A striking feature of the problem is that, for small values of the maximal concentration $C = c_{\max}/c_0$, the optimal solution is actually to have all five target genes be completely redundant, with identical values of $K_i$ and $n_i$. As $c_{\max}$ increases, this redundancy is lifted, and the optimal solution becomes a sequence of target genes with staggered activation curves, in effect 'tiling' the input domain $0 < c < c_{\max}$. To interpret these results, we realize that for small maximal concentration the input noise dominates and the optimal strategy for $M$ genes is to 'replicate' one well-placed gene $M$ times: having $M$ independent and redundant readouts (with identical $K$ and $n$) of the input concentration will decrease the noise by a factor of $\sqrt{M}$. However, as the dynamic range increases and output noise has a chance to compete with the input noise, more information can be transmitted by using $M$ genes to probe the input at different concentrations, thereby creating a cascade of genes that get activated at successively higher input levels. The transition between these two readout strategies is described in more detail below.

To look more closely at the structure of the problem, we drop down to consider two target genes. Then there are three possibilities: two activators (AA), two repressors (RR), and one of each (AR). For each of these discrete choices, we have to optimize two exponents $(n_1, n_2)$ and two half-maximal points $(K_1, K_2)$. In Fig 7 we show how $Z_1$ varies in the $(K_1, K_2)$ plane, assuming that at every point we choose the optimum exponents; the different quadrants correspond to the different discrete choices of activator and repressor. The results show clearly how the redundant ($K_1 = K_2$) solutions at low values of $c_{\max}$ bifurcate into asymmetric ($K_1 \neq K_2$) solutions at larger values of $c_{\max}$; the critical value of $c_{\max}$ is different for activators and repressors. This bifurcation structure is summarized in Fig 8, where we also see that, for each value of $c_{\max}$, the three different kinds of solutions (AA, RR and AR) achieve information capacities that differ by less than 0.1 bits.

[Figure 6 graphic: panels A-E plot the five optimal input/output relations against the input concentration; panel F plots the optimal $K_i$ against $c_{\max}/c_0$.]

FIG. 6: (Color online) Optimal input/output relations for the case of five independent target genes, activated by the TF at concentration $c$. Successive panels (A-E) correspond to different values of the maximal input concentration, as indicated ($C = 0.3, 1, 3, 5, 10$). Panel F summarizes the optimal values of the $K_i$ as a function of $C = c_{\max}/c_0$: as $C$ is increased, the $K_i$ of the fully redundant input/output relations for $C = 0.3$ bifurcate such that at $C = 10$ the genes tile the whole input range.

The information capacity is an integral of the square root of a sum of terms, one for each target gene [Eq (60)]. Thus if we add redundant copies of a single gene, all with the same values of $K$ and $n$, the integral $Z_1$ will scale as $\sqrt{M}$, where $M$ is the number of genes. In particular, as we go from 1 to 2 target genes, $Z_1$ would increase by a factor $\sqrt{2}$ and hence the information capacity, $\log_2 Z_1$, would increase by one half bit; more generally, with $M$ redundant copies, we have $(1/2)\log_2 M$ bits of extra information relative to having just one gene. On the other hand, if we could arrange for two target genes to make non-overlapping contributions to the integral, then two genes could have a value of $Z_1$ that is twice as large as for one gene, generating an extra bit rather than an extra half bit. In fact, a full factor of two increase in $Z_1$ isn't achievable, because once the two target genes are sampling different regions of concentration they are making different tradeoffs between the input and output noise terms; since the one gene had optimized this tradeoff, bifurcating into two distinguishable targets necessarily reduces the contribution from each target. Indeed, if the maximal concentration is too low then there is no 'space' along the $c$ axis to fit two distinct activation (or repression) curves, and this is why low values of $c_{\max}$ favor the redundant solutions.
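The $\sqrt{M}$ argument can be checked directly from the integrand of Eq (60). The sketch below is our own illustration: activating Hill genes only, concentrations in units of $c_0$, the constant prefactor of Eq (60) dropped, and hypothetical parameter values. It shows that $M$ identical genes raise $\log_2 Z_1$ by exactly $(1/2)\log_2 M$, and that two distinct genes never exceed the sum of their separate integrals, because $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$; this is why the gain over the best single gene is at most one bit.

```python
import math

def activator(x, K, n):
    """g(x) and dg/dx for an activating Hill function; x and K in units of c0."""
    xn, Kn = x ** n, K ** n
    return xn / (xn + Kn), n * x ** (n - 1) * Kn / (xn + Kn) ** 2

def z_multi(genes, C, steps=2000):
    """Integral over 0 < x < C of sqrt(sum_i g_i'^2 / (g_i + x g_i'^2)).

    This is the integral in Eq. (60) without its constant prefactor;
    genes is a list of (K_i, n_i) pairs."""
    h, total = 1.0 / steps, 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        x = C * u * u  # substitution softens the x -> 0 endpoint
        s = 0.0
        for K, n in genes:
            g, dg = activator(x, K, n)
            s += dg * dg / (g + x * dg * dg)
        total += math.sqrt(s) * 2.0 * C * u * h
    return total

# Redundant copies at small C = c_max/c0 (illustrative K = 0.2 c0, n = 2).
C_small, gene = 0.3, (0.2, 2.0)
z_one = z_multi([gene], C_small)
for M in (2, 4, 9):
    gain = math.log2(z_multi([gene] * M, C_small) / z_one)
    print(f"M={M}: gain {gain:.4f} bits, (1/2)log2 M = {0.5 * math.log2(M):.4f}")

# Two staggered genes at larger C: the pair cannot beat the sum of its parts.
C_large, pair = 30.0, [(2.0, 2.0), (10.0, 3.0)]
z_pair = z_multi(pair, C_large)
z_sum = z_multi(pair[:1], C_large) + z_multi(pair[1:], C_large)
print(f"Z(pair) = {z_pair:.4f} <= Z(gene 1) + Z(gene 2) = {z_sum:.4f}")
```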

Figure 9A shows explicitly that when we increase the number of target genes at low values of $c_{\max}$, the optimal solution is to use the genes redundantly, and hence the gain in information is $(1/2)\log_2 M$. At larger values of $c_{\max}$, going from one target to two targets one can gain more than half a bit, but this gain is bounded by one bit, and indeed over the range of $c_{\max}$ that we explore here the full bit is never quite reached.

We can take a different slice through the parameter space of the problem by holding the number of target genes fixed and varying $c_{\max}$. With a single target gene, we have seen (Fig 3) that the information capacity saturates rapidly as $c_{\max}$ is increased above $c_0$. We might expect that, with multiple target genes, it is possible to make better use of the increased dynamic range, and this is what we see in Fig 9B.

For a system with many target genes, it is illustrative to plot the optimal distribution of input levels, $P^*(c) \propto \sigma_c^{-1}(c)$. Figure 10 shows the results for the case of $M = 2, 3, \ldots, 9$ genes at low ($C = 0.3$) and high ($C = 30$) input dynamic range. At low input dynamic range the distributions for various $M$ collapse onto each other (because the genes are redundant), while at high $C$ increasing the number of genes drives the optimal distribution closer to $\propto c^{-1/2}$. We recall that the input noise is $\sigma_c \propto \sqrt{c}$, so this shows that, as the number of targets becomes large, the input noise becomes dominant over a wider and wider dynamic range.

Finally, one can ask how finely tuned the input/output relations for the particular genes need to be in a maximally informative system. To consider how the capacity of the system changes when the parameters of the input/output relations change slightly, we analyzed the (Hessian) matrix of second derivatives of the information with respect to fractional changes in the various parameters; we also made more explicit maps of the variations of information with respect to the individual parameters, and sampled the variations in information that result from random variations of the parameters within some range. Results for a two-gene system are illustrated in Fig 11.
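A Hessian with respect to fractional parameter changes is the matrix of second derivatives with respect to $\ln K_i$ and $\ln n_i$. The sketch below, our own and not the authors' code, computes it by central finite differences for a two-activator system. It reuses the integrand of Eq (60) without its constant prefactor (which does not affect derivatives of $\log_2 Z_1$), and the parameter values are placeholders rather than the optimum analyzed in Fig 11. An eigen-decomposition of the result, for example with `numpy.linalg.eigh`, would give modes analogous to those in Fig 11B.

```python
import math

def z_two_genes(logp, C, steps=1000):
    """Integral of Eq. (60) for two activators; logp = [ln K1, ln n1, ln K2, ln n2]."""
    K1, n1, K2, n2 = (math.exp(v) for v in logp)
    h, total = 1.0 / steps, 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        x = C * u * u  # units of c0; substitution softens x -> 0
        s = 0.0
        for K, n in ((K1, n1), (K2, n2)):
            xn, Kn = x ** n, K ** n
            g = xn / (xn + Kn)
            dg = n * x ** (n - 1) * Kn / (xn + Kn) ** 2
            s += dg * dg / (g + x * dg * dg)
        total += math.sqrt(s) * 2.0 * C * u * h
    return total

def info_bits(logp, C):
    """log2 Z1, i.e. the information up to an additive constant."""
    return math.log2(z_two_genes(logp, C))

def hessian(f, p, h=1e-3):
    """Central-difference Hessian of a scalar function f at the point p (list of floats)."""
    d = len(p)
    H = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            def f_shift(si, sj):
                q = list(p)
                q[i] += si * h
                q[j] += sj * h
                return f(q)
            H[i][j] = (f_shift(1, 1) - f_shift(1, -1)
                       - f_shift(-1, 1) + f_shift(-1, -1)) / (4.0 * h * h)
    return H

C = 10.0
p0 = [math.log(2.0), math.log(2.0), math.log(8.0), math.log(2.5)]  # placeholder values
H = hessian(lambda q: info_bits(q, C), p0, h=1e-2)
for row in H:
    print(" ".join(f"{v:9.4f}" for v in row))
```

Working in log-parameters means a step of 0.01 corresponds to a 1% change in $K_i$ or $n_i$, which is the fractional scale used in the discussion of Fig 11.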

The first point concerns the scale of the variations: 20% changes in parameters away from the optimum result in only $\sim 0.01$ bits of information loss, and this is true both at low $c_{\max}$, where the solutions are redundant, and at high $c_{\max}$, where they are not. Interestingly, the eigenmodes of the Hessian reveal that in the asymmetric case the capacity is most sensitive to variations in the larger $K$. The second most sensitive direction (much weaker than the first) is a linear combination of both of the parameters $K$ and $n$ for the gene which is activated at lower concentrations. Perhaps surprisingly, this means that genes which activate at higher $K$ need to have their input/output relations positioned with greater accuracy along the $c$ axis, even in fractional terms. If we think of $K \propto e^{-F/k_BT}$, where $F$ is the binding (free) energy of the transcription factor to its specific target site along the genome, another way of stating this result is that weaker binding energies (smaller $F$) must be specified with greater precision to achieve a criterion level of performance. Finally, if we allow parameters to vary at random, we see (Fig 11C & D) that the effects on information capacity are extremely small as long as these variations are bounded, so that the range of the natural log of the parameters is significantly less than one. If we allow larger fluctuations, there is a transition to a much broader distribution of information capacities, with a substantial loss relative to the optimum.

FIG. 7: (Color online) The case of two target genes. The maps show contour plots of relative information ($\log_2 Z_1$) as a function of the $K$ values of the two genes, $K_1$ and $K_2$. In each map, the upper right quadrant (A-A) contains solutions where both genes are activated by a common TF, in the lower left quadrant (R-R) both genes are repressed, and the other two quadrants (A-R) contain an activator-repressor mix. The maximal concentration of the input molecules is fixed at $c_{\max}/c_0 = 0.1$ in map A, at 0.5 in map B, and at 1 in map C. We see that, for example, only at the highest value of $c_{\max}$ does the two-activator solution in the upper right quadrant correspond to distinct values of $K_1$ and $K_2$; at lower values of $c_{\max}$ the optimum is along the 'redundant' line $K_1 = K_2$. The redundancy is lifted at lower values of $c_{\max}$ in the case of repressors, as we see in the lower left quadrants, and the mixed activator/repressor solutions are always asymmetric. At large $c_{\max}$ we also see that there are two distinct mixed solutions.

FIG. 8: The relative information for stable solutions for two genes as a function of $c_{\max}$ (panel A). The inset shows the difference in information transmission for two activators and for the mixed case, relative to two repressors. In panel B, the optimal $K_1$ and $K_2$ are plotted as a function of $c_{\max}$ for two activators (squares) and two repressors (circles). The bifurcation in $K$ is a continuous transition that happens at lower $c_{\max}$ in the case of two repressors.

FIG. 9: The relative information for different values of $c_{\max}$ as a function of the number of genes, $M$, shown in panel A. At low $c_{\max}$ the genes are redundant and so the capacity grows as $(1/2)\log_2 M$; at high $c_{\max}$, the increase in capacity is larger, but bounded from above by one bit. The differences in information for various combinations of activators and repressors are comparable to the size of the plot symbols. Panel B shows the relative information for different numbers of genes as a function of $c_{\max}$. At higher $M$, the system can make better use of the input dynamic range.



V. DISCUSSION 

The ability of cells to control the expression levels of their genes is central to growth, development and survival. In this work we have explored perhaps the simplest model for this control process, in which changes in the concentration of a single transcription factor protein modulate the expression of one or more genes by binding to specific sites along the DNA. Such models have many parameters, notably the binding energies of the transcription factor to the different target sites and the interactions or cooperativity among factors bound to nearby sites that contribute to the control of the same gene. This rapid descent from relatively simple physical pictures into highly parameterized models is common to most modern attempts at quantitative analysis of biological systems. Our goal in this work is to understand whether these many parameters can be determined by appeal to theoretical principles, rather than solely by fitting to data.

FIG. 10: (Color online) The optimal probability distribution of inputs, $P^*(c)$. In red (dotted line), plotted for $C = 0.3$. In blue (solid line), plotted for $C = 30$. Different lines correspond to solutions with $2, 3, \ldots, 9$ genes. At low $C$ (red dotted line), the genes are degenerate and the input distribution is independent of the number of genes. At high $C$ (blue solid line), where the genes tile the concentration range, the optimal input distribution approaches $(c/c_0)^{-1/2}$ (dashed line) as the number of target genes increases.

We begin our discussion with a caveat. Evidently, deriving the many parameters that describe a complex biological system is an ambitious goal, and what we present here is at best a first step. By confining ourselves to systems in which one transcription factor modulates the expression of many genes, with no further inputs or interactions, we almost certainly exclude the possibility of direct, quantitative comparisons with real genetic control networks. Understanding this simpler problem, however, is a prerequisite to analysis of more complex systems, and, as we argue here, sufficient to test the plausibility of our theoretical approach.

The theoretical principle to which we appeal is the optimization of information transmission. In the context of genetic control systems, we can think of information transmission as a measure of control power: if the system can transmit $I$ bits of information, then adjustment of the inputs allows the cell to access, reliably, $2^I$ distinguishable states of gene expression. In unicellular organisms, for example, these different states could be used to match cellular metabolism to the available nutrients, while in the developing embryo of a multicellular organism these different states could be the triggers for emergence of different cell types or spatial structures; in either case, it is clear that information transmission quantifies our intuition about the control power or (colloquially) complexity that the system can achieve. Although one could imagine different measures, specialized to different situations, we know from Shannon that the mutual information is the unique measure that satisfies certain plausible conditions and works in all situations [50, 51].

Information transmission is limited by noise. In the context of genetic control systems, noise is significant because the number of molecules involved in the control process is small, and basic physical principles dictate the random behavior of the individual molecules. In this sense, the maximization of information transmission really is the principle that organisms should extract maximum control power from a limited number of molecules. Analysis of experiments on real control elements suggests that the actual number of molecules used by these systems sets a limit of 1 to 3 bits on the capacity of a transcription factor to control the expression level of one gene, that significant increases in this capacity would require enormous increases in the number of molecules, and that, at least in one case, the system can achieve $\sim 90\%$ of its capacity [41, 42]. Although these observations are limited in scope, they suggest that cells may need to operate close to the informational limits set by the number of molecules that they can afford to devote to these genetic control processes.

The strategy needed to optimize information transmission depends on the structure of the noise in the system. In the case of transcriptional control, there are two irreducible noise sources: the random arrival of transcription factors at their target sites, and the shot noise in the synthesis and degradation of the output molecules (mRNA or protein). The interplay between these noise sources sets a characteristic scale for the concentration of transcription factors, $c_0 \sim 15$ to $150$ nM. If the maximum available concentration is much larger or much smaller than this scale, then the optimization of information transmission becomes degenerate, and we lose predictive power. Further, $c_0$ sets the scale for diminishing returns, such that increases in concentration far beyond this scale contribute progressively smaller amounts of added information capacity. Thus, with any reasonable cost for producing the transcription factor proteins, the optimal tradeoff between bits and cost will set the mean or maximal concentration of transcription factors in the range of $c_0$. Although only a very rough prediction, it follows without detailed calculation, and it is correct (Table I).

The optimization of information transmission is largely a competition between the desire to use the full dynamic range of outputs and the preference for outputs that can be generated reproducibly, that is, at low noise. Because of the combination of noise sources, this competition has non-trivial consequences, even for a single transcription factor controlling one gene. As we consider the control of multiple genes, the structure of the solutions becomes richer. Activators and repressors are both possible, and can achieve nearly identical information capacities. With multiple target genes, all the combinations of activators and repressors also are possible [60]. This suggests that, generically, there will be exponentially many networks that are local optima, with nearly identical capacities, making it possible for a theory based on optimization to generate diversity.

FIG. 11: (Color online) Parameter variations away from the optimum. Results are shown for a two-gene system, focusing on the solution with two activators. (A) Analysis of the Hessian matrix for $c_{\max}/c_0 = 0.3$, where the two genes are redundant. Top four panels show the variation in information ($\delta I$, in bits) along each dimension of the parameter space (thick red line) and the quadratic approximation. (B) As in (A), but with $c_{\max}/c_0 = 10$, where the optimal solution is non-redundant. We also show the eigenvectors and eigenvalues of the Hessian matrix. (C) Distribution of information loss $\Delta I$ when the parameters $K_1$ and $K_2$ are chosen at random from a uniform distribution in $\ln K$, with widths as shown; here $c_{\max}/c_0 \approx 10$. (D) As in (C), but for variations in the Hill coefficients $n_1$ and $n_2$.

When the available range of input transcription factor concentrations is small (for example $C_{\max}/c_0 = 0.3$ in Fig. 11A), the solutions which optimize information transmission involve multiple redundant target genes. Absent this result, the observation of redundant targets in real systems would be interpreted as an obvious sign of non-optimality, a remnant of evolutionary history, or perhaps insurance against some rare catastrophic failure of one component. As the available range of transcription factor concentrations becomes larger, optimal solutions diversify, with the responses of the multiple target genes tiling the dynamic range of inputs. In these tiling solutions, targets that require higher concentrations to be activated or repressed are also predicted to exhibit greater cooperativity; in such an optimized system one thus should find some genes controlled by a small number of strong binding sites for the transcription factor, and other genes with a large number of weaker sites.

To a large extent, the basic structure of the (numerically) optimal solutions can be recovered analytically through various approximation schemes. These analytic approximations make clear that the optimization really is driven by a conflict between using the full dynamic range of outputs and avoiding states with high intrinsic noise. In particular, this means that simple intuitions based on maximizing the entropy of output states, which are correct when the noise is unstructured (the same at every output level) [34], fail. Thus, almost all solutions have the property that at least one target gene is not driven through the full dynamic range of its outputs, and even with one gene the midpoint of the optimal activation curve can be far from the midpoint of the available range of inputs. The interplay between different noise sources also breaks the symmetry between activators and repressors, so that repressors optimize their information transmission by using only a small fraction of the available input range.

The predictive power of our approach depends on the existence of well-defined optima. At the same time, it would be difficult to imagine evolution tuning the parameters of these models with extreme precision, so the optima should not be too sharply defined. Indeed, we find that optima are clear but broad. In the case of multiple genes, random variations of ~25% in the parameters around their optima cost only tiny fractions of a bit of information, but once fluctuations become larger than this the information drops precipitously (Fig. 11C, D). Looking more closely, we find that proper placement of the activation curves at the upper end of the input range is more critical, implying that it is actually the weaker binding sites (those with the largest $K$) whose energies need to be adjusted more carefully (perhaps contrary to intuition).
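Why moderate parameter variations cost so little can be seen from the quadratic (Hessian) approximation used in Fig. 11. As a sketch, assume the perturbations $\delta\theta_i$ of the parameters (for example $\ln K_1$ and $\ln K_2$) are independent and uniformly distributed with full width $w$, as in Fig. 11C, and write $H$ for the Hessian of $I$ at the optimum $\theta^*$, which is negative semidefinite at a maximum, with eigenvalues $\lambda_k \le 0$:

```latex
\begin{aligned}
I(\theta^* + \delta\theta) &\approx I(\theta^*) + \tfrac{1}{2}\sum_{i,j} H_{ij}\,\delta\theta_i\,\delta\theta_j, \\
\langle \delta\theta_i\,\delta\theta_j \rangle &= \delta_{ij}\,\frac{w^2}{12}
  \qquad \text{for } \delta\theta_i \sim \mathcal{U}[-w/2,\, w/2], \\
\langle \Delta I \rangle = \langle I(\theta^*) - I(\theta^* + \delta\theta) \rangle
  &\approx -\frac{w^2}{24}\,\mathrm{Tr}\,H = \frac{w^2}{24}\sum_k |\lambda_k| .
\end{aligned}
```

Within this approximation the mean loss grows only quadratically with $w$ (halving the width cuts it by a factor of four), and it is dominated by the stiffest eigendirections of $H$.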

With modest numbers of genes, the optimization approach we propose here has the promise of making rather detailed predictions about the structure of the input/output relations, generating what we might think of as a spectrum of $K$s and $n$s. In the limit of larger networks, we might expect this spectrum to have some universal properties, and we see hints of this in Fig. 10. Here, as we add more and more target genes, the optimal distribution of inputs approaches an asymptote $P_{\rm TF}(c) \propto 1/\sqrt{c}$; more of this limiting behavior is accessible if the available dynamic range of inputs is larger. This is the form we expect if the effective noise is dominated by the input noise, $\sigma_c \propto \sqrt{c}$, since in the small-noise limit the optimal input distribution is proportional to $1/\sigma_c(c)$. Thus, adding more targets and placing them optimally allows the system to suppress output noise and approach ever more closely the fundamental limits set by the physics of diffusion.
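The link between $\sigma_c \propto \sqrt{c}$ and the $1/\sqrt{c}$ asymptote can be worked out explicitly in the small-noise approximation, where the optimal input distribution is $P^*_{\rm TF}(c) = 1/[Z\,\sigma_c(c)]$ and the capacity is $I^* = \log_2[Z/\sqrt{2\pi e}]$. Writing the input noise as $\sigma_c = \alpha\sqrt{c}$, with $\alpha$ an unspecified constant set by diffusion, and taking inputs in $0 \le c \le C_{\max}$:

```latex
\begin{aligned}
Z &= \int_0^{C_{\max}} \frac{dc}{\alpha\sqrt{c}} = \frac{2\sqrt{C_{\max}}}{\alpha},
\qquad
P^*_{\rm TF}(c) = \frac{1}{Z\,\alpha\sqrt{c}} = \frac{1}{2\sqrt{C_{\max}\,c}}, \\
I^* &= \log_2 \frac{Z}{\sqrt{2\pi e}}
     = \frac{1}{2}\log_2 C_{\max} + \log_2 \frac{2}{\alpha\sqrt{2\pi e}} .
\end{aligned}
```

In this input-noise-limited regime each doubling of $C_{\max}$ adds only half a bit, so capacity grows logarithmically with the available concentration.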

Although there are few direct physical measurements specifying the input/output relations of genetic regulatory elements, there are many systems in which there is evidence for 'tiling' of the concentration axis by a set of target genes, all regulated by the same transcription factor, along the lines predicted here [55]. For example, in quorum sensing by bacteria, the concentrations of extracellular signaling molecules are translated internally into different concentrations of LuxR, which acts as a transcription factor on a number of genes, and these can be classified as being responsive to low, intermediate and high levels of LuxR [62]. Similarly, the decision of Bacillus subtilis to sporulate is controlled by the phosphorylated form of the transcription factor Spo0A, which regulates the expression of ~30 genes as well as an additional 24 multi-gene operons [63]. For many of these targets the effects of Spo0A~P are direct, and the sensitivity to high vs. low concentrations can be correlated with the binding energies of the transcription factor to the particular promoters [64]. In yeast, the transcription factor Pho4 is a key regulator of phosphate metabolism, and activates targets such as pho5 and pho84 at different concentrations [58]. All of these are potential test cases for the theoretical approach we have outlined here (each with its own complications), but a substantially new level of quantitative experimental work would be required to test the theory meaningfully.



The classic example of multiple thresholds in the activation of genes by a single transcription factor is in embryonic development H]. In this context, spatial gradients in the concentration of transcription factors and other signaling molecules mean that otherwise identical cells in the same embryo experience different inputs. If multiple genes are activated by the same transcription factor but at different thresholds, then smooth spatial gradients can be transformed into sharper 'expression domains' that provide the scaffolding for more complex spatial patterns. Although controversies remain about the detailed structure of the regulatory network, the control of the 'gap genes' in the Drosophila embryo by the transcription factor Bicoid seems to provide a clear example of these ideas [1 [Ml EOl ED EliZS]. Recent experimental work [TB] [TT] suggests that it will be possible to make absolute measurements of (at least) Bicoid concentrations, and to map the input/output relations and noise in this system, holding out the hope for more quantitative comparison with theory.



Finally, we look ahead to the more general problem in which multiple target genes are allowed to interact. Absent these interactions, even our optimal solutions have a strong degree of redundancy: as the different targets turn on at successively higher concentrations of the input, there is a positive correlation, and hence redundancy, among the signals that they convey. This redundancy could be removed by mutually repressive interactions among the target genes, increasing the efficiency of information transmission in much the same way that lateral inhibition or center-surround organization enhances the efficiency of neural coding in the visual system [33, 35]. It is known that such mutually repressive interactions exist, for example among the gap genes in the Drosophila embryo [74]. The theoretical challenge is to see if these observed structures can be derived, quantitatively, from the optimization of information transmission.